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Abstract 

Background: It has been known for over a decade that Plosmodium falciparum proteins are enriched in non- 
globular domains of unknown function. The potential for these regions of protein sequence to undergo high levels 
of genetic drift provides a fundamental challenge to attempts to identify the molecular basis of adaptive change in 
malaria parasites. 

Results: Evolutionary comparisons were undertaken using a set of forty P. falciparum metabolic enzyme genes, 
both within the hominid malaria clade {P. reichenowi) and across the genus {P. chabaudi). All genes contained 
coding elements highly conserved across the genus, but there were also a large number of regions of weakly or 
non-aligning coding sequence. These displayed remarkable levels of non-synonymous fixed differences within the 
hominid malaria clade indicating near complete release from purifying selection (dN/dS ratio at residues non- 
aligning across genus: 0.64, dN/dS ratio at residues identical across genus: 0.03). Regions of low conservation also 
possessed high levels of hydrophilicity, a marker of non-globularity. The propensity for such regions to act as 
potent sources of non-synonymous genetic drift within extant P. falciparum isolates was confirmed at 
chromosomal regions containing genes known to mediate drug resistance in field isolates, where 150 of 153 
amino acid variants were located in poorly conserved regions. In contrast, all 22 amino acid variants associated 
with drug resistance were restricted to highly conserved regions. Additional mutations associated with laboratory- 
selected drug resistance, such as those in PfATPase4 selected by spiroindolone, were similarly restricted while 
mutations in another calcium ATPase (PfSERCA, a gene proposed to mediate artemisinin resistance) that reach 
significant frequencies in field isolates were located exclusively in poorly conserved regions consistent with genetic 
drift. 

Conclusion: Coding sequences of malaria parasites contain prospectively definable domains subject to neutral or 
nearly neutral evolution on a scale that appears unrivalled in biology. This distinct evolutionary landscape has 
potential to confound analytical methods developed for other genera. Against this tide of genetic drift, 
polymorphisms mediating functional change stand out to such an extent that evolutionary context provides a 
useful signal for identifying the molecular basis of drug resistance in malaria parasites, a finding that is of relevance 
to both genome-wide and candidate gene studies in this genus. 



Background 

Identifying the molecular basis of disease-causing traits 
is one of the major justifications for the recent expan- 
sion in genomic data covering a wide range of taxa. 
Nowhere is this goal more clearly defined than in the 
case of malaria, where adaptive evolution in the form of 
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drug resistance continues to undermine efforts to con- 
trol human disease caused by P. falciparum [1] and P. 
vivax [2], Understanding the molecular basis of resis- 
tance phenotypes is of great operational importance as 
markers can be used to monitor spread and alternative 
therapeutic strategies can be designed. Establishment of 
a genetic cross has proven a fruitful starting point for 
determination of the genotypic basis of drug resistance 
[3-5], but is difficult, expensive and hence rarely 
achieved in practice. Genomic epidemiological 
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approaches currently represent a promising route for- 
ward allowing detection of signatures of selection asso- 
ciated with drug resistance [6], although this approach 
relies on identification of linkage disequilibrium which 
is known to be of variable strength [7]. Candidate gene 
approaches based on other forms of data analysis may 
also generate hypotheses suitable for further testing 
[8,9]. 

Despite the power of these approaches for the discov- 
ery of new drug-resistance genotypes, a decade has 
passed since the description of the last confirmed resis- 
tance gene for a widely used antimalarial [4]. It remains 
exceptional for mutations hypothesized as being 
involved in drug-resistance to be validated by transfec- 
tion and phenotypic testing in P. falciparum, steps that 
allow polymorphisms of true adaptive value to be discri- 
minated from those that are present simply as a result 
of genetic drift. 

A specific phenomenon that may complicate studies in 
this genus relates to the remarkable degree to which 
Plasmodium proteins are enriched in non-globular 
domains [10]. Since their first systematic description 
more than a decade ago, the function of these domains 
has remained unresolved [11-13] with one possibility 
being that they represent downstream consequences of 
events at the DNA level (i.e. broadly neutral in func- 
tional terms). If this were to be the case, they would 
represent potent generators of neutral or near neutral 
non-synonymous polymorphism in line with central 
concepts of modern evolutionary biology [14-16] that 
will inevitably confound both genome-wide and candi- 
date approaches to identifying the molecular basis of 
resistance. Notably, minimal attempts have been made 
to determine systematically the degree to which specific 
areas of coding sequence have become released from 
negative selective in the long-term, and by inference 
unlikely to mediate functional change in the short-term. 
Measurement of selective forces is still generally under- 
taken at a whole-gene level, an approach that falls down 
if pressures vary considerably within individual genes, 
preventing determination of the true baseline for varia- 
tion which is neutral evolution within and between 
malaria species. 

We reasoned that application of two basic biological 
concepts could improve significantly the accuracy of 
approaches designed to identify the genetic basis of 
drug-resistance. Firstly, drug resistance, by necessity, 
requires a significant functional change in the parasite, 
and hence a functional change in one or more proteins. 
Secondly, functionally important regions of proteins are 
generally conserved across wide distances of evolution, 
and can therefore be discriminated in orthologous 
sequence alignments from sequences under minimal 
constraint [17]. 



We undertook a systematic study of evolutionary pro- 
cesses in Plasmodium coding sequences, revealing wide- 
spread, dynamic variation in selection pressures within 
individual coding sequences consistent with neutral the- 
ories of evolution. The use of underlying conservation 
score as an independent means of identifying functional 
variation against the backdrop of genetic drift was vali- 
dated and applied by examining the molecular basis of 
drug-resistance, a phenotype that, by necessity, requires a 
significant functional change in the parasite, and hence 
frequently a functional change in one or more proteins. 

Results 

Polymorphism and divergence: whole-gene level 

In order to study short-term evolutionary processes 
within the hominid malaria clade a reference set of 40 
P. falciparum genes (Table 1) was identified, consisting 
of sequences well covered by genome sequencing of the 
closely related primate parasite P. reichenowi [18]. Sin- 
gle-nucleotide changes in these genes associated with 
divergence (between P. falciparum 3D7 strain and P. 
reichenowi) and intra-species polymorphism (within 
sequenced P. falciparum isolates) were quantified. The 
40-gene reference set contained 1270 fixed synonymous 
differences between P. falciparum and P. reichenowi 
indicating a synonymous pairwise divergence (dS) of 
6.05%, a figure consistent with previous estimates [19]. 
There were 795 non-synonymous fixed differences (dN 
= 0.92%) indicating an overall dN/dS ratio of 0.17, con- 
sistent with broad negative selection across these genes. 
This compares to a genome-wide assessment of fixed 
differences between P. falciparum and P. reichenowi, 
dN/dS = 0.21, and indicates that the sequences chosen 
broadly capture the range of dN/dS ratios encountered 
in the P. falciparum genome [18]. Within sequenced P. 
falciparum isolates there were 83 synonymous and 102 
non-synonymous polymorphisms (dN/dS ratio 0.31). 
Non-synonymous changes hence constitute 55.1% of 
intraspecies polymorphism but only 38.5% of interspe- 
cies divergence, suggesting that approximately one-third 
of observed amino acid variation within sequenced P. 
falciparum isolates is deleterious and would naturally 
undergo purifying selection over a longer period. 

Conservation across the Plasmodium genus 

Study of conservation across the Plasmodium genus was 
undertaken using orthologous protein sequences from 
the rodent parasite P. chabaudi, as genome sequencing 
of P. chabaudi was at a more advanced stage at the 
time of the studies than other available rodent species. 
P. falciparum (3D7) protein sequences were compared 
with P. chabaudi) an average score for all hominid 
malaria variants would strictly be more accurate, 
although this would only influence approximately 1% of 
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Table 1 Reference set of genes 

P. falciparum gene Other identifier Product Pf amino acids P. chabaudi gene Pc amino acids 

ATPases 

PFA0300C MALI P1. 52 

PFC0840w PfATPase7 

PFE0195w PfATPase3 

PF07_0047 None 

PF08_01 13 None 

PFI0240C None 

PFL0590C PfATPase4 

PFL1125W MALI 2P1. 225 



Transporters 












PFB0210C 


PfHTI 


Hexose transporter 


504 


PCHAS_030470 


505 


PFB0465C 


PF02_0097 


Monocarboxylate transporter 


457 


PCHAS_030940 


456 


PFE1 185w 


MAL5P1.237 


Metal transporter 


684 


PCHASJ 23900 


704 


PFF0450C 


MAL6P1.94 


Zn or Fe transporter 


361 


PCHAS_010830 


338 


PFF0170w 


MAL6P1.38 


Calcium antiporter 


441 


PCHAS_010300 


440 


PFF1430C 


MAL6P1.133 


Amino acid transporter 


606 


PCHASJ 12780 


617 


PFI1 295c 


None 


Monocarboxylate transporter 


529 


PCHAS_082700 


507 


PF1 1 _02 1 0 


None 


Mg 2+ , Co 2+ & Ni 2+ channel 


529 


PCHAS_091640 


435 


PF1 1_0338 


PfAQP 


Aquaglyceroporin 


258 


PCHAS_093040 


258 


MAL13P1.23 


None 


CorA-like Mg 2+ transporter 


468 


PCHASJ 40460 


491 


PF13_0252 


PfNTI 


Nucleoside transporter 


422 


PCHASJ 36470 


414 


PF14_0679 


None 


Inorganic anion antiporter 


664 


PCHASJ 33900 


700 


Glycolytic 












PFF1 155w 


MAL6P1.189 


Hexokinase 


493 


PCHASJ 1 2240 


494 


PF14_0341 


None 


Glucose-6-phosphate isomerase 


579 


PCHASJ 00870 


578 


PFI0755C 


None 


6-Phosphofructokinase 


1418 


PCHAS_081620 


1306 


PF14_0425 


None 


Fructose-bisphosphate aldolase 


369 


PCHASJ 31 180 


369 


PF14_0378 


None 


Triosephosphate isomerase 


248 


PCHASJ 30700 


248 


PFI1105W 


PGK 


Phosphoglycerate kinase 


416 


PCHAS_082320 


416 


PF11_0208 


None 


Phosphoglycerate mutase 


250 


PCHAS_091620 


250 


PF10_0155 


None 


Enolase 


446 


PCHASJ 21 500 


446 


PFF1300w 


MAL6P1.160 


Pyruvate kinase 


511 


PCHASJ 125 10 


511 


PF13_0141 


PfLDH 


L-lactate dehydrogenase 


316 


PCHASJ 34470 


316 


DNA/RNA 












PFB0730w 


PF02_0151 


DEAD/DEAH box helicase 


1997 


PCHAS_031480 


1404 


PFC0805W 


MAL3P6.20 


DNA-directed RNA polymerase II 


2457 


PCHAS_080680 


2307 


PFE0465C 


MAL5P1.95 


RNA polymerase 1 


2914 


PCHASJ 10870 


2570 


PFE0715W 


MAL5P1.144 


Asparagine-tRNA ligase 


1128 


PCHASJ 11 360 


1084 


PFF1095W 


MAL6P1.201 


Leucyl tRNA synthase 


1447 


PCHASJ 12120 


1251 


MAL7P1.145 


None 


Mismatch repair protein pmsl 


1330 


PCHAS_020880 


1094 


PF13_0251 


None 


DNA topoisomerase III 


710 


PCHASJ 36460 


676 


PF14_0234 


None 


DNA-directed DNA polymerase 


1236 


PCHASJ 01 890 


989 


PF14_0316 


None 


DNA topoisomerase II 


1472 


PCHASJ 01 120 


1460 


PF14_0695 


None 


DNA-directed RNA polymerase 


861 


PCHASJ 34050 


796 



Vacuolar ATPase-coupled proton transport 
Phospholipid transporting ATPase 
Cation transporting ATPase 
Cell-division cycle ATPase 
Vacuolar proton transporting ATPase 
Cu 2+ - transporting ATPase 
Non-SERCA type Ca 2+ -transporting ATPase 
Phospholipid transporting ATPase 



383 PCHAS_020560 383 

1864 PCHAS_080610 1773 

2393 PCHASJ 10330 1904 

1229 PCHASJ 22230 1106 

1053 PCHASJ 22440 957 

2563 PCHAS_041740 2128 

1208 PCHAS_020540 1468 

1618 PCHASJ 44030 1601 



residues. The mean level of protein conservation per 
gene ranged widely across the reference set of 40 house- 
keeping genes (mean +4.88 to -0.47 based on the BLO- 
SUM62 matrix). It was also noted that overall 20.1% of 
P. falciparum residues were non-aligned, compared to 
13.3% for P. chabaudi, consistent with expansion of 



non-aligned sequence in the hominid malaria lineage 
since the last common ancestor of these species. 

Correlation between short and long-term conservation 

At the whole-gene level the dN/dS ratio for P. falci- 
parum - P. reichenowi divergence strongly correlated 
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with the mean cross-genus conservation score (Figure 
la, r 2 = 0.69, p < 0.0001); as expected dS showed no 
such relationship. For polymorphism within P. falci- 
parum (where interpretation of dN/dS is less straightfor- 
ward [20]), non-significant trends of the same form were 
seen (Figure lb). Gene function also appeared to be 
associated with the degree of purifying selection, with 
glycolytic enzymes showing lower dN/dS values than the 
other groups (Figure lc) consistent with previous work 
[18]. We compared our dN/dS scores for P. falciparum/ 
reichenowi divergence (Nei-Gojobori method with Jukes- 
Cantor correction) with those previously reported for 
the same 40 genes using maximum likelihood for calcu- 
lation of dN/dS [18]. The two sets of results were well 
correlated (Spearman r = 0.96) indicating that in relative 
terms dN/dS is very similar for the differing methods; it 
also confirms that the fixed difference datasets between 
the two studies are highly comparable. The slope of the 
line of correlation between our results and those of Jef- 
fares et al. [18] was 0.81 suggesting that the Nei- 



Gojobori analysis overestimates dN/dS to a small degree 
compared to PAML. 

A stratified analysis was undertaken in which each 
amino acid residue was assigned to one of four levels of 
cross-genus evolutionary conservation according to P. 
falciparum - P. chabaudi CLUSTALW alignment. Resi- 
dues were defined as non-aligned, non-conserved (-4 to 
-1), semi-conserved (0 to +3) and identical (+4 and 
above) on the basis of the BLOSUM62 matrix. Synon- 
ymous substitution rates within the hominid malaria 
clade were similar across all levels of conservation (Fig- 
ure 2a) but non-synonymous substitution rates were 
strongly influenced by long-term evolutionary context 
(Figure 2b). The dN/dS ratio for divergence was 
approximately 20-fold higher at non-aligning residues 
(0.64) compared to residues identical across the genus 
(0.034; Table 2). Chi-squared testing comparing diver- 
gence with polymorphism (McDonald-Kreitman test) 
within each conservation stratum showed that strong 
purifying selection was largely confined to residues 
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Figure 1 Hominid malaria clade divergence and conservation across the Plasmodium genus. For the forty individual genes of the 
reference set, dN/dS and dS are plotted against mean conservation score for each gene, (a) Fixed differences between P. falciporum and P. 
reichenowi. (b) Polymorphisms within P. falciporum. (c) dN/dS ratios for fixed differences between P. falciporum and P. reichenowi according to 
functional group. 
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Figure 2 Mutation rates within the hominid malaria clade stratified by conservation level across the Plasmodium genus, (a) 

Synonymous substitution rates, (b) dN/dS ratios. Data are based on 40 reference genes. Fixed differences between P. folciporum and P. 

reichenowi are indicated by filled grey bars; polymorphism within P. falciparum is indicated by empty bars. Error bars represent 95% confidence 

intervals. Conservation levels were based on BLOSUM62 score at individual amino acids in P. falciparum - P. chabaudi alignments; identical (+4 

and above), semi-conserved (0 to +3), non-conserved (-4 to -1) and non-aligned. 
^ J 



identical across the genus (Table 2), with minimal evi- 
dence for purifying selection in non-aligned sequence 
regions. 

Relationship between evolutionary conservation, 
hydrophilicity and informational complexity 

Given the lack of a gold-standard informatic approach 
to defining non-globular regions, the relationship 
between conservation and non-globularity was explored 
via one purely structural marker of non-globularity 
(Kyte-Doolittle hydrophilicity) as well as a purely infor- 
mational one (low-complexity regions defined by the 
SEG algorithm). These studies were undertaken in the 



context of an approach to cross-genus conservation 
measurement designed for high throughput use, invol- 
ving a sliding window of 9 amino acid residues. 

Decreasing cross-genus conservation was linked to 
both a shift to unusually high hydrophilicity scores, with 
a clear inflexion point present between conservation 
scores of 2 and 3 (Figure 3), and a reduction in strength 
of negative selection within the hominid malaria clade 
(Figure 4). A minority (31%) of sequence with conserva- 
tion score less than 2.5 was responsible for 77% of non- 
synonymous differences between P. falciparum and P. 
reichenowi, including 87% of radical substitutions (Fig- 
ure 4c). Low-complexity regions defined by the SEG 



Table 2 Divergence and polymorphism in reference genes assessed by McDonald-Kreitman test, stratified by level of 
cross-genus conservation 



Conservation level 


dN/dS 


Adjusted dN/dS 




Synonymous 


Non-synonymous 


Chi-squared 


Neutrality Index 


Identical 


0.030 


0.034 


D 


839 


101 


8.3 x 10" 15 


5.6 








P 


52 


35 






Semi-conserved 


0.21 


0.24 


D 


186 


174 


0.22 


1.6 








P 


12 


18 






Non-conserved 


0.35 


0.39 


D 


90 


118 


0.38 


1.5 








P 


7 


14 






Non-aligned 


0.55 


0.64 


D 


155 


402 


0.74 


1.1 








P 


12 


35 






Total 


0.15 


0.17 


D 


1270 


795 


9.5 x 10~ 6 


2.0 








P 


83 


102 







dN/dS ratios refer to P. falciparum IP. reichenowi divergence. Conservation levels were based on BLOSUM62 score in P. falciparum - P. chabaudi alignments; 
identical (+4 and above), semi-conserved (0 to +3), non-conserved (-4 to -1) and non-aligned. Adjusted dN/dS ratios were derived based on the transition: 
transversion ratio measured at 4-fold degenerate sites (see Table 4). D = divergent mutations, P = polymorphisms. Chi-squared and Neutrality Index are the 
outputs of the McDonald-Kreitman test; neutrality index indicates the extent to which the levels of variation depart from the expected in the neutral model with 
values greater than 1 indicating an excess of polymorphism and hence negative selection. 
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Figure 3 Density plot of conservation vs. hydrophilicity for amino acid residues of 40 reference genes, (a) Number of occurrences at 
each co-ordinate (using bins of 0.1 on both x and Y-axes). (b) Average hydrophilicity (conservation score bins of 0.2). Conservation levels in P. 
falciparum - P. chabaudi alignments were based on BLOSUM62 score using a sliding, overlapping window of 9 residues; hydrophilicity was based 
on the Kyte-Doolittle index (sliding, overlapping window of 14). 



algorithm (default parameters) made up 11.9% of the 
sequence studied (missing many poorly conserved 
regions of sequence and including some highly con- 
served areas), and contained only 19.0% of non-synon- 
ymous differences between P. falciparum and P. 
reichenowi. Taken together these observations indicate 
that a measurement of cross-genus evolutionary conser- 
vation identifies highly hydrophilic sequences that show 
relaxed purifying selection in the short-term, features of 
non-globular domains, and that this method appears 
superior to an approach based purely on informational 
complexity. 

Chromosomal regions containing genes responsible for 
drug resistance in field isolates 

In order to test the utility of this approach for detect- 
ing functional variation in P. falciparum, we examined 
three chromosomal regions containing PfCRT, 
PfDHFR and PfDHPS, testing polymorphisms between 
two pairs of sensitive and resistant parasite isolates in 
each case. As observed with the reference genes, there 



were wide ranging levels of long-term conservation 
(Figure 5), and a similar shift to very high levels of 
hydrophilicity at conservation scores below 3 in all 
three cases (Figure 6). Indeed, the proportion of 
sequence falling below a conservation threshold of 2.5 
was even greater than in the reference gene set (56% 
vs. 31%) indicating that at a genomic level non-globu- 
lar domains may be responsible for an even higher 
proportion of non-synonymous polymorphism than in 
our reference genes. Consistent with this, amino acid 
variants in genes not involved in drug resistance fell 
almost exclusively in less conserved regions with 150 
of 153 such amino acid variants below the 2.5 conser- 
vation threshold (Figure 5g, Chi-squared test for pro- 
portion of mutations vs. proportion of total sequence 
at each conservation level: p = 9 x 10" 23 ). In contrast 
22 amino acid variants within known drug-resistance 
genes were located in more conserved regions (Chi- 
squared test for proportion of mutations vs. propor- 
tion of total sequence at each conservation level: p = 
3.8 x 10" 7 ), with all mutations being located at 
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Figure 4 Conservation-hydrophilicity at sites of amino acid variants in 40 reference genes, (a) Conservative amino acid variants between 
P. folciporum and P. reichenowi. (b) Radical amino acid variants between P. falciparum and P. reichenowi. (c) Proportion of total sequence, 
conservative amino acid variants and radical amino acid variants according to conservation level (conservation score bins of 1.0). Conservation 
and hydrophilicity were derived as for Figure 3. 



positions with relatively high conservation scores (> 
3.0). Further there was no overlap in conservation 
scores between amino acid variants in the drug-resis- 
tance gene and those within other genes within any 
single alignment. These results indicate that the posi- 
tion of mutations within a long-term and measurable 
evolutionary landscape provides a robust signature 
that can aid identification of functionally important 
amino acid variants. 

Laboratory-selected drug resistance 

Fourteen amino acid variants associated with selection 
of atovaquone, azithromycin and spiroindolone resis- 
tance in vitro (involving mitochondrial, apicoplast and 
nuclear genomes respectively [21-23]) conformed to the 
same pattern (Figure 7). Combining these with field 
resistance mutations produced a set of 36 non- 



synonymous polymorphisms associated with drug resis- 
tance. Notably, among these mutations the nature of the 
amino acid variants itself (radical vs. conservative) pro- 
vided no discriminatory value, indicating that it is pri- 
marily the location of the protein change that 
distinguishes functional adaptation. 

Population data from P. falciparum isolates 

Evolutionary analysis of long-term conservation is also 
relevant to population studies on large numbers of field 
samples since the neutral theory of evolution describes a 
strong relationship between selective pressure and allele 
frequency [24]. Sequences of PfSERCA, a postulated tar- 
get of artemisinin, have been determined for thousands 
of isolates across the world (See Table 3 for data 
sources). Consistent with neutral evolution, maximum 
allele frequencies for amino acid variants in PfSERCA 
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Figure 5 Relationship between Plasmodium genus conservation score and amino acid variation in P. falciparum drug-resistance 
regions. For each of three drug-resistance regions, two comparisons between resistant and sensitive isolates were made, [a, b) CRT resistance 
locus on chromosome 7. (c, d) DHFR resistance locus on chromosome 4. (e, f) DHPS locus on chromosome 8. Mutations in known drug- 
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Gardner et al. BMC Evolutionary Biology 201 1, 1 1 :257 
http://www.biomedcentral.com/1471-2148/11/257 



Page 9 of 16 



PfCRT locus 



8 

CO 



8 

Q 

i 




Trans-genus conservation score 



2.0- 



5 1.5 



Q. 

o 

■O1.0- 



0.0- 



••A 



-5 -4 -3 -2 -1 0 1 2 3 4 5 6 7 
Trans-genus conservation score 



DHFR locus 




2.0- 



•■§1.5- 



CL 
O 

-fei.o- 
>% 
-£= 

c 

CD 

^0.5- 



0.0' 



/ V 



Trans-genus conservation score 



-5 -4 -3 -2 -1 0 1 2 3 4 5 6 7 
Trans-genus conservation score 



DHPS locus 




— i — i — i — i — i — i — i — i — i — i — i — — i — i — i — i — i — i — i — i — i — i — i — i — i 

-3 -2 -1 0 1 2 3 4 5 6 7 -5 -4 -3 -2 -1 0 1 2 3 4 5 6 7 

Trans-genus conservation score Trans-genus conservation score 



Figure 6 Density plot of conservation vs. hydrophilicity for 3 drug resistance regions. Left: Number of occurrences at each co-ordinate 
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derived as for Figure 3. 
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were linked closely to cross-genus conservation level. All 
mutations reaching a prevalence of more than 25% were 
located in regions of low conservation (Figure 8a), 
matching the pattern of fixed differences between P. fal- 
ciparum and P. reichenowi SERCAs. The opposite 



pattern was seen with amino acid variants associated 
with spiroindolone resistance in PfATPase4, which stand 
out distinctly against the flow of corresponding amino 
acid variants between P. falciparum and P. reichenowi 
ATPase4 sequences (Figure 8b). 
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Figure 7 Conservation-hydrophilicity plot for amino acid 
variants associated with drug resistance or within drug- 
resistance regions. All mutations shown to be involved in drug 
resistance by transfection (22 nSNPs in CRT, DHFR and DHPS) or 
drug pressure studies (14nSNPs in cytochrome b, rpl4 and 
PfATPase4) are shown along with variants in chromosomal regions 
around CRT, DHFR and DHPS (see Figure 5). 



Table 3 Maximum field allele frequencies for non- 
synonymous mutations in PfSERCA (PFA0310c, PfATP6) 



Discussion 

Non-adaptive evolution in Plasmodium 

We show that genes that maintain synteny and clearly 
defined orthologous relationships across the genus also 
contain poorly conserved domains that occupy approxi- 
mately half of the total coding sequence when analysed 
at the chromosomal level. In other words, in Plasmo- 
dium species, negative (purifying) selective force is unu- 
sually chimeric in nature, being alternately strong and 
weak within the same gene. Plasmodium chabaudi has 
approximately two-thirds as much non-aligned sequence 
as P. falciparum, while other work on a genome-wide 
scale indicates that P. vivax has approximately 40% [25] . 
This indicates that the phenomenon is genus-wide, 
being particularly marked in the hominid malaria clade. 
The reason why internal proteins mediating core biolo- 
gical functions possess such widespread areas of non- 
globular domain experiencing minimal purifying selec- 
tion remains obscure but our data are certainly consis- 
tent with these sequences representing a downstream 
consequence of genetic processes [13,26] such as repli- 
cation slippage and unequal crossing over [27]. The dN/ 
dS ratio of 0.64 in non-aligned regions suggests that 
these sequences are under very weak negative selective 
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Table 3 Maximum field allele frequencies for non-synon- 
ymous mutations in PfSERCA (PFA0310c, PfATP6) 

(Continued) 
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3D7 sequence was used as reference. Only studies where alleles were 
determined for at least ten samples were included. Samples with mixed 
infections were included in each numerator. 



pressure, with purifying selection perhaps acting only to 
maintain non-globular structure. 

Applications 

Improved understanding of these issues has major impli- 
cations for the study of all Plasmodium biology that 
relates to evolution. The findings are most immediately 
relevant to the goal of understanding molecular 
mechanisms of drug resistance, since this represents a 
classical example of evolution in action. As predicted, 
we were able to show that mutations selected by drug 
pressure in the field or laboratory are located in con- 
served protein sequences that have remained largely 
unchanged for millions of years (being conserved across 
the Plasmodium genus) by virtue of their functional 
importance, as already noted by other authors 
[22,28-30]. What has not been considered is that the 
gradient in purifying selective force across the genus is 
strong enough to allow drug-resistance mutations to be 
discriminated accurately from those that are likely to 
non-adaptive in nature. Prediction of the functional con- 
sequences of mutations based on long-term evolutionary 
conservation has been described in the context of 
human genetic studies [31-33] but there is even greater 
scope for this approach to assist studies in P. falci- 
parum, where the contribution of non-adaptive change 
appears to be log-orders of magnitude higher. 

The potential for non-adaptive evolution to confound 
studies of genotype-phenotype relationships is clear in 
several areas. Candidate gene surveys based on sequen- 
cing of field isolates, where few or no phenotypic data 
are available, provide an obvious example; the majority 
of polymorphisms found in PfSERCA are seen to be 
located in poorly conserved regions, including all those 
reaching frequencies of more than 25% in field isolates. 
This finding indicates that polymorphism in this gene is 
dominated by non-adaptive evolution, and the tempta- 
tion to invoke positive selection [34] should be avoided. 
Use of dN/dS to infer positive selection in whole gen- 
ome analyses of P. vivax [35] may also be prone to the 
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Figure 8 Conservation-hydrophilicity plots and single- 
nucleotide changes in the hominid malaria clade in calcium- 
ATPase genes, (a) PfSERCA (PFA0310c) showing amino acid 
variants between P. falciparum and P. reichenowi and field amino 
acid variants (see Table 3); symbol size indicates maximum allele 
frequency on a countrywide basis, (b) PfATPase4 (PFL0590c), 
showing amino acid variants between P. falciparum and P. 
reichenowi as well as the mutations associated with spiroindolone 
selection (see text). Conservation and hydrophilicity are derived as 
for Figure 3. 



same issue; although measures of dN/dS across a gene 
tend to be conservative due to purifying selection, it is 
still likely on statistical grounds that across a large num- 
ber of genes, a number of high dN/dS ratios (greater 
than one) will be generated simply by the action of non- 
adaptive evolution. 

Understanding of the degree to which non-adaptive 
evolution occurs is also relevant to the identification of 
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drug-resistance genes in a chromosomal context. In our 
analyses of regions known to contain drug-resistance 
genes, application of evolutionary conservation added 
considerable specificity with more than 95% of muta- 
tions outside drug-resistance genes falling into poorly 
conserved sequence. Nevertheless the accuracy of the 
evolutionary approach as it stands is unlikely to be suffi- 
cient for its use in isolation, but rather as a method for 
pinpointing polymorphisms within regions identified by 
other means, particularly genome-wide association stu- 
dies using population genetic tests [6,36]. Genetic cross 
experiments are also a robust starting point; for exam- 
ple, the complex polymorphisms in the cgl and cg2 
genes seen at the chloroquine resistance locus identified 
by the cross of HB3 and DD2 [37] would be defined as 
being of low priority for further study by the approach 
we describe. Further work is in progress to examine sys- 
tematically the evolutionary properties of other regions 
already identified as being of phenotypic importance by 
experimental means [38], with the aim of prioritizing 
polymorphisms for more detailed studies. 

Wider relevance 

Awareness of the chimeric selection landscape of Plas- 
modium coding sequences is also relevant to the study 
of the history of Plasmodium species. A higher dN/dS at 
the whole gene level for hominid parasites P. falciparum 
and P. reichenowi compared to rodent malaria parasites 
has been suggested to provide evidence for small popu- 
lation size during the P. falciparum-reichenowi diver- 
gence [39]. However, based on the set of forty reference 
genes we studied, hominid malaria proteins contain sig- 
nificantly more non-aligned sequence than do rodents, 
likely leading to higher dN/dS ratios at the whole gene 
level. In fact in sequence regions conserved across the 
Plasmodium genus the average dN/dS ratio for the P. 
falciparum-P. reichenowi divergence was 0.034, indicat- 
ing strong purifying selection and implying a historically 
large population size following the lateral transfer of P. 
falciparum to humans from bonobos [40] or gorillas 
[41]. 

Conclusion 

The coding sequences of Plasmodium species exemplify 
the theories of neutral evolution [14,42] and the ensuing 
nearly neutral theory [15], operating on a grand scale 
that to our knowledge is unrivalled in genome biology. 
Failing to take into account the effect of this in any 
work that relates to evolution of coding sequence repre- 
sents a missed opportunity to distinguish adaptive from 
non-adaptive polymorphism. For example neither gen- 
ome-wide or candidate gene association studies into 
antimalarial resistance currently take into account the 
fact that most polymorphism is non-adaptive in nature. 



The approach we describe takes advantage of protein 
sequence data, but does not require any knowledge of 
protein function, a major advantage since the elucida- 
tion of function for the majority of Plasmodium genes 
remains a distant goal. Application of this method on a 
genome-wide scale, where it can be integrated into 
DNA-based methods as well as candidate gene 
approaches, offers a powerful approach for future stu- 
dies in the field of antimalarial resistance, as well as 
other areas of research that are linked to evolutionary 
biology in this important genus. 

Methods 

Comparisons between P. falciparum and P. chabaudi 
orthologs 

The rodent malaria species Plasmodium chabaudi was 
used as comparator species given its status in terms of 
genome completion. Protein and DNA sequences (Plas- 
modb version: 2009.03.24) were obtained from Plas- 
modb.org. Forty orthologous pairs of sequences were 
chosen, fulfilling the following criteria; no known or 
hypothesised role in drug resistance or host interaction, 
syntenic relationship between P. falciparum and P. cha- 
baudi and > 75% coverage in the P. reichenowi ortholog 
[18] (used for measurement of divergence within homi- 
nid malaria parasites). Since sexual-stage genes are 
released from purifying selection in asexual culture 
(experienced by several of the isolates under study) [43] 
genes with no evidence of asexual expression in tran- 
scriptomic surveys [44,45] were also excluded. In order 
to reflect the types of genes which are implicated in 
drug resistance, as well as to obtain a range of conserva- 
tion levels across this reference set, the reference genes 
consisted of ATPases (8), secondarily active transporters 
(12), glycolytic enzymes (10) and enzymes involved in 
DNA and RNA processing (10) (Table 1). The Plasmodb 
gene model for PfL0590c (PfATPase4) was found to be 
inconsistent with published data based on cDNA; the 
latter were used as coding sequence [23]. 

CLUSTALW alignments of orthologous protein 
sequences from P. falciparum and P. chabaudi were per- 
formed using the default settings of BioEdit. BLOSUM62 
scores (reflecting conservation) were then calculated for 
each P. falciparum residue. Regions that could not be 
aligned between P. falciparum and P. chabaudi orthologs 
were defined as the gaps in BLASTP alignments of P. fal- 
ciparum and P. chabaudi orthologs (BLOSUM62 matrix, 
gap penalties: existence 9, extension 2) A manual check 
of the protein alignment in BioEdit was also performed 
and on rare occasions where short alignments had been 
excluded by the BLASTP search these were retained. For 
residues where there was no aligned P. chabaudi residue, 
a conservation score of -5 was applied. Relatively small 
regions with no P. reichenowi coverage were also 
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removed from analysis to ensure comparable denomina- 
tors for inter- and intra-species comparison. 

Annotation of polymorphisms and fixed differences 

We analysed single-nucleotide polymorphisms (SNPs) 
derived from Plasmodb, based on available sequence for 
various P. falciparum strains from around the world 
generated by the Broad Institute [46], Wellcome Trust 
Sanger Institute [18] and NIH [47]. Fixed differences 
between P. falciparum 3D7 strain and P. reichenowi 
(Oscar strain) were also obtained from Plasmodb [18]. 
Radical amino acid substitutions were defined as those 
with BLOSUM62 matrix score < 0. 

The challenge of identifying single nucleotide changes 
within a sequence that is undergoing frequent insertion- 
deletion polymorphism has been described [48] . In addi- 
tion, we noted that although complex polymorphisms 
were said to be excluded in publications, the Plasmodb 
lists of SNPs within P. falciparum and fixed differences 
between P. falciparum 3D7 strain and P. reichenowi 
sometimes contained repetitive mutations within tandem 
repeats (confirmed by Pustell protein matrix, Mac Vector) 
that were clearly part of complex indel polymorphisms 
and hence not genuine SNPs. These regions contributed 
27.5% of all SNPs among P. falciparum isolates and 9.4% 
of the interspecies divergence, and were excluded from 
both polymorphism and divergence analyses. 

Calculation of positions and transitions 

Calculation of synonymous and non-synonymous posi- 
tions was undertaken for each P. falciparum ortholog 
using a standard substitution matrix (assuming equal 
mutation rates) with Jukes-Cantor correction [49]. Con- 
fidence intervals for dS were determined assuming a 
continuous distribution. Confidence intervals for dN/dS 
were determined using the delta method. Transitional 
bias was determined by studying synonymous fixed dif- 
ferences between P. falciparum and P. reichenowi ortho- 
logs occurring at amino acids encoded by four codons. 
Consistent with previous measurements on a chromoso- 
mal scale [50], synonymous sites made up 20.1% of all 
sites within the reference genes (See Table 4) with 8.9% 
of sites being 4-fold degenerate sites (nucleotide 



positions at which all mutations are synonymous). For 
synonymous differences at 4-fold degenerate sites, which 
we assume are selectively neutral, transitions made up 
41.4% of changes and transversions 58.6%, consistent 
with a moderate transitional bias that would produce 
falsely low dN/dS ratios [51] (since transitional muta- 
tions are associated with degeneracy at many 2-fold 
degenerate sites). Taking this factor into account led to 
upwards revision of dN/dS ratios for divergence of 
between 12 and 16% according to the level of conserva- 
tion; the effect was greatest for non-aligned sequence 
where the adjusted dN/dS ratio rose to 0.64 (Table 1). 

Studies of hydrophilicity and complexity 

Hydrophilicity scores were measured by the Kyte-Doo- 
little index (window = 14). Low-complexity regions were 
defined using the SEG algorithm at its default para- 
meters [52]. 

Drug-resistance chromosomal regions 

P. chabaudi orthologs were again used to generate the 
conservation score. In the case of one gene (MAL8P1.111) 
there was no rodent malaria parasite ortholog as pre- 
viously reported [53] and in consequence the syntenic P. 
vivax ortholog was used for comparison. For the single 
apicoplast gene rpl4 the partial P. chabaudi sequence 
PC10361 1.00.0 was available for generation of the cross- 
genus conservation score at sites of mutation. For all stu- 
dies at drug-resistance regions a neighbourhood conserva- 
tion score was used (averaging the individual conservation 
scores across a sliding, overlapping window of 9 residues). 
This allows for the possibility that drug-resistance muta- 
tions may occur at residues that have previously under- 
gone conservative change within a wider area of 
conservation, thereby reducing stochastic loss of sensitiv- 
ity. This also obviated the need for a specific step to iden- 
tify non-aligned regions at genomic regions. Non- 
synonymous SNPs (nSNPs) between sensitive and resistant 
parasites were studied at each locus, spanning the drug- 
resistance gene in each case and extending outwards sym- 
metrically until 10 nSNPs outside the drug-resistance gene 
itself had been documented in at least one pairwise com- 
parison. All residues known to be intrinsic to resistance 



Table 4 Synonymous sites in reference set of genes 



Conservation level 


All 


4-fold 


Transitional sites 


Transversional sites 


Adjusted S 


Identical 


14127 (20.8) 


6384 (9.4) 


6389 (9.4) 


1354 (2.0) 


15537 


Semi-conserved 


2889 (19.1) 


1249 (8.3) 


1319 (8.7) 


321 (2.1) 


3176 


Non-conserved 


1609 (21.5) 


811 (10.9) 


672 (9.0) 


126 (1.7) 


1759 


Non-aligned 


3240 (17.7) 


1187 (6.5) 


1819 (10.0) 


234 (1.3) 


3658 


Total 


21866 (20.1) 


9631 (8.9) 


10200 (9.4) 


2036 (1.9) 


24131 



Transitional sites refer to 2-fold synonymous codons by virtue of transitional substitution; transversional sites to 2-fold synonymous codons by virtue of 
transversional substitution. Numbers in parentheses indicate proportion (%) of total positions. 
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haplotypes, whether or not each individual residue has 
been shown to cause drug-resistance independently, were 
included. Chi-squared testing was undertaken testing 
whether the distribution of amino acid variants in terms of 
conservation level was the same as the distribution of total 
sequence, using 13 conservation levels (bins of 1), and 
hence 12 degrees of freedom. The test was performed first 
for drug-resistance genes and mutations, and then for the 
other genes and the mutations within them. 
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